Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Apply filtering to text attachments; offer to auto-upload text attachments to paste bin #3241

Merged
merged 13 commits into from
Jan 30, 2025

Conversation

swfarnsworth
Copy link
Member

This is a work-in-progress intended to allow us to lift the embargo on text file attachments. Whether we use this functionality, or anything like it, is subject to further discussion. I'm submitting it as a draft PR to expose what I've done so far.

Allowing text file attachments is intended to remove friction in making a user's code available to question-answerers during help sessions. If a user uploads their py file, and it's deleted by the bot, the user might become confused and not follow the instructions to use the paste bin. Instructing users on how to use the paste bin can also be grating for question-answerers, especially when they would have been able to read the py file in the file embed that appears in the desktop client, which provides syntax highlighting.

However, the file embeds do not appear in the mobile version of Discord. Our filtering system also does not consider the content of text-based file attachments. This PR implements a solution for the former and the beginnings of a solution for the latter.

Offer to auto-upload text attachments to the paste bin

This functionality, which is already fully implemented, is intended to benefit mobile users who can't see the text file embeds, while upholding our promise not to copy user content to an external system (ie, our paste bin) without their permission. It consists of these steps, following a user sending a message containing text attachments:

  • The bot replies to the message, asking the user to react with a checkmark to have the attachments automatically uploaded to the paste bin.
  • If the user complies within a fixed duration, the bot does so, and replaces its own message with a link to the paste bin.
  • The bot DMs the user the delete link for the paste.
  • Alternatively, the bot also waits for the user to react with a trashcan emoji, deleting the paste. This option times out, but the delete link sent via DMs does not.

Apply token filters to text attachments

This one is much trickier. The simplest solution I could think of (after trying a few more complicated ones) is to simply treat the content of text attachments as a continuation of the message itself, though this could make it not-immediately-obvious to moderators when a filter is triggered by an attachment rather than the true message content. We also don't want to waste compute time applying the filters to especially long text attachments (which is relatively likely for CSV and JSON attachments).

We might decide on a constant size limit for text-based attachments and modify the extension filters to consider both the file extension and the size of the file, while also not appending attachment content to the message if the file size exceeds that same constant. We might also not apply the token filter to CSVs and JSONs, since those are usually posted as part of a minimal reproducible example and are probably not intended to be read.

…ion to auto-upload to pastebin.

Also DMs the delete URL to the user. This code will very likely be moved elsewhere before/if it is merged.
Works by appending text attachment content to message content, and then applying the filters normally.
@swfarnsworth swfarnsworth added a: moderation Related to community moderation functionality: (moderation, defcon, verification) review: do not merge The PR can be reviewed but cannot be merged now labels Jan 19, 2025
@swfarnsworth swfarnsworth requested a review from mbaruh as a code owner January 19, 2025 17:50
Implements a somewhat arbitrary limit on how much text content is passed along for filtering, to avoid wasting compute time on large attachments that aren't intended to be read (such as CSVs)
Copy link
Member

@lemonsaurus lemonsaurus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tentatively approved. The feature looks solid, but I have stylistic concerns.

  • Classes, methods and coroutines should have docstrings
  • The cognitive complexity (e.g. mccabe) of on_message is too high. It's too deeply nested, too long, and doesn't break the code into logical blocks with helpful block comments headlining them as a longer piece of code ought to. I'd like to see a refactor here to move helper functions out into the class, and perhaps break up the different pieces into a few function calls with helpful names. I think generally that it's better that on_message itself is aggressively short and very easy to read and understand, while the implementation details are abstracted away into other functions.
  • Somewhat inconsistent adherence to PEP257. Comments should start with capital letters and end with punctuation, and should be imperative mood.

Looks like a great addition to the bot, though <3 thank you for writing it! I can't wait to see it merged.

@@ -20,7 +28,7 @@
f"please use a code-pasting service such as {PASTE_URL}"
)

TXT_LIKE_FILES = {".txt", ".csv", ".json"}
TXT_LIKE_FILES = {".txt", ".csv", ".json", ".py"}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're lifting the ban on .py, then anything adjacent feels like it might as well also be added. Maybe .js, .css, .html so that anyone using Python to build a webapp of some sort won't run into a brick wall? What about stuff like .yaml, .toml?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realized later that that code has no effect on which file types are allowed.

self.bot = bot

@staticmethod
async def _convert_attachment(attachment: discord.Attachment) -> paste_service.PasteFile:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

coroutines deserve ✨d o c s t r i n g s ✨

bot/exts/filtering/_filter_lists/extension.py Outdated Show resolved Hide resolved
bot/exts/filtering/_filter_lists/extension.py Outdated Show resolved Hide resolved
Previously, the bot might have offered to upload the attachments in a message containing only images, and then done nothing.
… paste.

The expanded initial message tells the user that uploading to the paste bin is for accessibility.

I hallucinated that PasteResponse objects have a delete method, which they do not.
…cts.

Messages might be deleted immediately if the message or the attachment trips a filter, in which case we don't want the user to be able to upload them.
@swfarnsworth
Copy link
Member Author

@lemonsaurus thanks for your feedback. I thought I was in the finishing-touches stage for this, but there's actually still more to be done. I'm pushing what I've worked on this evening for additional scrutiny/ridicule.

These files will be made allowed. Also move `TXT_LIKE_FILES` to the other module that uses it.
…rect encoding.

Attachments with "charset" in their content type are presumed to be text. The specified charset is always used to decode the text.
@swfarnsworth swfarnsworth merged commit 1a8ee2c into main Jan 30, 2025
3 of 5 checks passed
@swfarnsworth swfarnsworth deleted the pastebin-auto-upload branch January 30, 2025 23:17
@swfarnsworth swfarnsworth restored the pastebin-auto-upload branch January 30, 2025 23:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
a: moderation Related to community moderation functionality: (moderation, defcon, verification) review: do not merge The PR can be reviewed but cannot be merged now
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants